ABSTRACT
In recent years, relation extraction from unstructured text has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical annotation methods are therefore needed to sustain relation extraction performance. This paper proposes an active learning method based on subsequences and distant supervision. Instead of full sentences, as in traditional active learning, the method selects information-rich subsequences as the sampling unit for annotation. In addition, the method stores the labeled subsequence texts and their corresponding labels in a dictionary that is continuously updated and maintained, and it pre-labels the unlabeled set through text matching, following the idea of distant supervision. Finally, the method is combined with a Chinese-RoBERTa-CRF model for relation extraction in Chinese medical texts. In experiments on the CMeIE dataset, the method achieves the best performance among the compared existing methods, and the best F1 score across the sampling strategies tested is 55.96%.
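The dictionary-based pre-labeling step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `update_dictionary` and `pre_label`, the character-level tokenization, the longest-match-first rule, and the tag names `B-SUBJ`, `I-SUBJ`, `B-OBJ`, `I-OBJ` are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store: labeled subsequence (as a tuple of tokens) -> its tag sequence.
LabelDict = Dict[Tuple[str, ...], List[str]]


def update_dictionary(label_dict: LabelDict, tokens: List[str], tags: List[str]) -> None:
    """Save a newly annotated subsequence and its tags in the dictionary."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    label_dict[tuple(tokens)] = list(tags)


def pre_label(tokens: List[str], label_dict: LabelDict) -> List[Optional[str]]:
    """Pre-label a sentence by exact text matching against stored subsequences.

    Longer entries are matched first (an assumption), and positions that
    already carry a tag are not overwritten. Unmatched positions stay None,
    meaning they would still need a model prediction or a human annotation.
    """
    tags: List[Optional[str]] = [None] * len(tokens)
    entries = sorted(label_dict.items(), key=lambda kv: len(kv[0]), reverse=True)
    for sub, sub_tags in entries:
        n = len(sub)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(t is None for t in tags[i:i + n])
            if window_free and tuple(tokens[i:i + n]) == sub:
                tags[i:i + n] = sub_tags
    return tags


# Usage: two subsequences annotated in earlier rounds pre-label a new sentence.
label_dict: LabelDict = {}
update_dictionary(label_dict, list("阿司匹林"), ["B-SUBJ", "I-SUBJ", "I-SUBJ", "I-SUBJ"])
update_dictionary(label_dict, list("头痛"), ["B-OBJ", "I-OBJ"])
result = pre_label(list("患者服用阿司匹林后头痛缓解"), label_dict)
```

In this sketch, only the characters covered by a dictionary entry receive a tag; the remaining positions are left unlabeled, so the dictionary reduces annotation effort without replacing the sampling step.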
Subject(s)
Problem-Based Learning , Supervised Machine Learning , Language , China , Reference Books, Medical
ABSTRACT
Traditional Chinese medicine (TCM) has a long history of serving the health of the Chinese people, including an important role in treating and preventing COVID-19 in 2020. That TCM has been used in China for thousands of years attests to its value and to the reasons for its continued existence. Although TCM has been, and continues to be, questioned, there is no doubt about its importance in terms of efficacy. This article focuses on how TCM understands the human body in comparison with anatomical knowledge in Western medicine, and discusses the development and advances of TCM in terms of its view of the body and its theoretical innovation. The aim is to help foreign scholars gain a better understanding of TCM from this perspective.